We were curious if LLMs are robust under various typo rates.
Hmm, a couple of thoughts. I don’t think my write-up is perfect, but how about something more like this:
> We were curious if LLMs produce the same response when there are typos in the prompt. To test this, we injected typos into the prompts from BigCodeBench and ran different Claude models. We found that while Opus’s accuracy gradually declined with typo rate, Haiku’s accuracy actually increased as the typo rate rose. This blog post investigates this counterintuitive phenomenon.
(points I care about: 1. “robust under various typo rates” doesn’t convey a linear increase in typo rate, which is what we actually do (see the injection sketch after this list).
2. “We double-checked our code and asked Claude to do so too, but we couldn’t find any bugs”: this should be assumed for all your work.
3. “The mystery begins.”: this phrasing sounds quite AI-generated.
)
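For reference, here’s a minimal sketch of what a linear typo-rate sweep could look like. The actual perturbation scheme isn’t specified in this thread, so the character-level edit types (swap, drop, duplicate), the `inject_typos` helper, and the specific rates below are all illustrative assumptions, not the experiment’s real code.

```python
import random

# Hypothetical sketch of per-character typo injection at a linear rate.
# The edit types here (swap, drop, duplicate) are assumptions; the
# experiment's actual perturbation scheme isn't described in this thread.

def inject_typos(prompt: str, rate: float, seed: int = 0) -> str:
    """Corrupt each alphabetic character of `prompt` with probability `rate`."""
    rng = random.Random(seed)
    chars = list(prompt)
    out: list[str] = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if c.isalpha() and rng.random() < rate:
            edit = rng.choice(["swap", "drop", "dup"])
            if edit == "swap" and i + 1 < len(chars):
                out += [chars[i + 1], c]  # transpose with the next character
                i += 2
            elif edit == "drop":
                i += 1                    # delete the character
            else:
                out += [c, c]             # duplicate (also the fallback when
                i += 1                    # a swap isn't possible at the end)
        else:
            out.append(c)
            i += 1
    return "".join(out)

# Sweep the typo rate linearly, e.g. 0% through 25% in 5% steps.
prompt = "Write a function that sorts a list of integers."
for rate in (0.0, 0.05, 0.10, 0.15, 0.20, 0.25):
    print(f"{rate:.2f}: {inject_typos(prompt, rate, seed=42)}")
```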